A web content mining application for detecting relevant pages using Jaccard similarity


Abstract

<span lang="EN-US">The tremendous growth in the availability of enormous text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of technology in the digital realm has resulted in the dispersion of texts over millions of web sites. Unstructured texts are densely packed with textual information. The discovery of valuable and intriguing relationships in unstructured texts demands more computer processing. So, text mining has developed into an attractive area of study for obtaining organized and useful data. One of the purposes of this research is to discuss text pre-processing in automobile marketing domains in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. As a result of the information retrieved from these advertisements, a systematic search for certain noteworthy qualities is performed. There are numerous approaches to query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.</span>
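The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the advertisement format, the `AD_PATTERN` regular expression, the sample ads, and the helper names `tokenize`, `extract_ad`, and `rank` are all assumptions made for illustration. The only fixed part is the standard definition of Jaccard similarity between two sets, J(A, B) = |A ∩ B| / |A ∪ B|.

```python
from __future__ import annotations

import re

# Hypothetical ad layout: "<year> <make> <model> ...". The real
# advertisements in the paper may look quite different.
AD_PATTERN = re.compile(
    r"(?P<year>(?:19|20)\d{2})\s+(?P<make>[A-Za-z]+)\s+(?P<model>[A-Za-z0-9]+)"
)


def extract_ad(text: str) -> dict[str, str] | None:
    """Pull year, make, and model out of a free-text ad, or None if no match."""
    match = AD_PATTERN.search(text)
    return match.groupdict() if match else None


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank(query: str, ads: list[str]) -> list[tuple[float, str]]:
    """Score every ad against the query and sort by descending similarity."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(ad)), ad) for ad in ads]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample data, for illustration only.
ADS = [
    "2015 Toyota Corolla automatic, white, low mileage",
    "2012 Honda Civic manual, red",
    "2018 Toyota Camry automatic, black",
]

if __name__ == "__main__":
    print(extract_ad(ADS[0]))
    for score, ad in rank("toyota automatic white", ADS):
        print(f"{score:.2f}  {ad}")
```

Unlike a SQL `LIKE` pattern, which either matches a row or does not, the Jaccard score is graded, so a threshold on it can be tuned to decide which ads count as relevant suggestions.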


Similar resources

Cleaning Web Pages for Effective Web Content Mining

Classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-based search engines and taxonomic web page categorization applications). Noise on web pages is irrelevant to the main content of the pages being mined, and includes advertisements, navigation bars, and copyright noti...


Similarity based Dynamic Web Data Extraction and Integration System from Search Engine Result Pages for Web Content Mining

There is an explosive growth of information in the World Wide Web thus posing a challenge to Web users to extract essential knowledge from the Web. Search engines help us to narrow down the search in the form of Search Engine Result Pages (SERP). Web Content Mining is one of the techniques that help users to extract useful information from these SERPs. In this paper, we propose two similarity b...


Identifying Spam Web Pages Based on Content Similarity

The Web provides its users with abundant information. Unfortunately, when a Web search is performed, both users and search engines are faced with an annoying problem: the presence of misleading Web pages, i.e., spam Web pages, that are ranked among legitimate Web pages. The mixed results downgrade the performance of search engines and frustrate users who are required to filter out useless infor...


A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, the search results of these engines can still be refined, and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...


Mining users' navigation patterns for building web pages recommendation system

Due to the quick growth of the World Wide Web, retrieval of useful information from the Internet for a particular web user or a group of users becomes very difficult. Recommendation systems using web usage mining help provide an adaptive web environment for web users. This paper presents a novel approach for page recommendation using a fuzzy association rule mining algorithm. This method extract...



Journal

Journal title: International Journal of Electrical and Computer Engineering

Year: 2022

ISSN: 2088-8708

DOI: https://doi.org/10.11591/ijece.v12i6.pp6461-6471